ABSTRACT

We present new algorithms for Personalized PageRank estimation and
Personalized PageRank search. First, for the problem of estimating
Personalized PageRank (PPR) from a source distribution to a target node,
we present a new bidirectional estimator with simple yet strong guarantees
on correctness and performance, and 3x to 8x speedup over existing
estimators in experiments on a diverse set of networks. Moreover, it has a
clean algebraic structure which enables it to be used as a primitive for
the Personalized PageRank Search problem: Given a network like Facebook, a
query like “people named John,” and a searching user, return the top nodes
in the network ranked by PPR from the perspective of the searching user.
Previous solutions either score all nodes or score candidate nodes one at
a time, which is prohibitively slow for large candidate sets. We develop a
new algorithm based on our bidirectional PPR estimator which identifies
the most relevant results by sampling candidates based on their PPR; this
is the first solution to PPR search that can find the best results without
iterating through the set of all candidate results. Finally, by combining
PPR sampling with sequential PPR estimation and Monte Carlo, we develop
practical algorithms for PPR search, and we show via experiments that our
algorithms are efficient on networks with billions of edges.

Categories and Subject Descriptors 

H.3.3 [Information Search and Retrieval]: Search process;
G.2.2 [Graph Theory]: Graph Algorithms

General Terms 

Algorithms, Performance, Experimentation, Theory 

Keywords 

Personalized Search, Personalized PageRank, Social Network Analysis

1. INTRODUCTION

On social networks, personalization is necessary for returning relevant
results for a query. For example, if a user searches for a common name
like John on a social network like Facebook, the results should depend on
who is doing the search and who their friends are. A good personalized
model for measuring the importance of a node t to a searcher s is
Personalized PageRank $\pi_s(t)$ [20, 13, 12]. This motivates a natural
Personalized PageRank Search Problem: Given

• a network with nodes V (each associated with a set of keywords) and
edges E (possibly weighted and directed),

• a keyword inducing a set of targets:

$T = \{t \in V : t \text{ is relevant to the keyword}\}$

• a searching user $s \in V$ (or more generally, a distribution over
starting nodes),

return the top-k targets $t_1, \ldots, t_k \in T$ ranked by Personalized
PageRank $\pi_s(t_i)$.

The importance of personalized search extends beyond social networks. For
example, personalized PageRank can be used to rank items in a bi-partite
user-item graph, in which there is an edge from a user to an item if the
user has liked that item. This has proven useful on YouTube when
recommending videos [5] and on Twitter for suggested users [3, 12]. On the
web graph there is a large body of work on using Personalized PageRank to
rank web pages (e.g. [14, 13]). The most clear-cut motivation for our work
is the social network name-search application discussed above, which we
use as a running example in this paper.

The personalized search problem is difficult because every searching user
has a different ranking on the target nodes. One naive solution would be
to precompute the ranking for every searching user, but if our network has
n users this requires $O(n^2)$ storage, which is clearly infeasible.
Another naive baseline would be to use power iteration [20] at query time,
but that would take $\Theta(m)$ computation between the search query and
response, where m is the number of edges, which is also clearly
infeasible. The challenge we face is to create a data structure much
smaller than $O(n^2)$ which allows us to rank $|T|$ targets in response to
a query in less than $O(|T|)$ time.

Previous work has considered the problem of personalized search on social
networks. For example Vieira et al. [24] consider this problem and provide
excellent motivation for why results to a name-search query should be
ranked based on the friendships of the searching user and the candidate
results. They and others (e.g. [4]) propose to rank results by shortest
path length. However, this metric doesn’t take into account the number of
paths between two users: If the searcher and two results John A and John B
are distance 3 apart, but the searcher and John A are connected by 100
length-3 paths while the searcher and John B are connected by a single
length-3 path, then John A should be ranked above John B, yet the shortest
distance can’t distinguish the two. To the best of our knowledge, no prior
work has solved the Personalized PageRank search problem using less than
$O(n^2)$ storage and $O(|T|)$ query time. We are able to do so by
exploiting a new bidirectional method for estimating PageRank, introduced
in [19] and improved in this work.

Our search algorithm is based on two key ideas. The first is that we can
find the top target nodes without having to consider each separately by
sampling a target $t_i \in T$ in proportion to its Personalized PageRank
$\pi_s(t_i)$. Because the top results typically have a much higher
personalized PageRank than an average result, by sampling we can find the
top results without iterating over all the results. The second idea is
that the probability of a random walk exactly reaching an element in T is
often very small, but by pre-computing an expanded set of nodes around
each target, we can efficiently sample random walks until they get close
to a target node, and then use the pre-computed data to sample targets
$t_i$ in proportion to $\pi_s(t_i)$.

There are currently two main limitations to our work. First, because we do
pre-computation on the set of nodes relevant to a query, we need the set
of queries to be known in advance, although in the case of name search we
can simply let the space of queries be the set of all first or last names.
Second, the pre-computed storage is significant; for name-search it is
$O(n\sqrt{m})$ to achieve query running time $O(\sqrt{m})$, where n is the
number of nodes and m is the number of edges. However, large graphs tend
to be sparse, so this is still much smaller than $O(n^2)$ and is less
storage than any prior solution to the Personalized PageRank Search
problem. Also, pre-computation doesn’t need to be done for all queries:
for queries with small or very large target sets we describe alternative
algorithms which do not require pre-computation. These alternatives also
overcome the limitation on queries being known in advance.

Contributions: To summarize, in this work we present:

• A new bidirectional PageRank estimator, Bidirectional-PPR (section 3),
which has the following features:

  - Simple analysis: We combine a simple linear-algebraic invariant with
    standard concentration bounds. The new analysis also allows
    generalizations to arbitrary Markov Chains, as done in [6].

  - Easy to implement: The complete algorithm is only 18 lines of
    pseudo-code.

  - Significant empirical speedup: For a given accuracy, it executes 3x-8x
    faster than the fastest previous algorithm, FAST-PPR [19], on a
    diverse set of networks.

  - Simple linear structure: As shown in section 4.1, the estimates are a
    simple dot-product between a forward vector $x_s$ and a reverse vector
    $y^t$; this enables the development of PPR samplers.

  - Parallelizability: Because the estimate is a dot-product, the
    precomputed vectors can be sharded across many servers, and the
    estimation algorithm can be naturally distributed, as shown in [11].

• Two new solutions to the Personalized PageRank Search problem:
BiPPR-Precomp-Grouped and BiPPR-Precomp-Sampling. Given any set of targets
T:

  - BiPPR-Precomp-Grouped precomputes and stores the reverse vectors $y^t$
    after grouping them by their coordinates. This exploits the natural
    sparsity of these vectors to speed up the computation of the PPR
    estimates at runtime.

  - BiPPR-Precomp-Sampling samples nodes $t \in T$ proportional to their
    PPR $\pi_s(t)$. Since PPR values are usually highly skewed, this
    serves as a good proxy for finding the top k search results.

• Extensive simulations on the Twitter-2010 network to test the
scalability of our algorithms for PPR-search. Our experiments demonstrate
the trade-off between storage and runtime, and suggest that we should use
a combination of methods, depending on the size of the set of targets T
induced by the keyword.

2. PRELIMINARIES 

We are given a graph $G = (V, E)$ with n nodes and m edges. Define the
out-neighbors of a node u by $N^{out}(u) = \{v : (u, v) \in E\}$ and let
$d^{out}(u) = |N^{out}(u)|$; define $N^{in}(u)$ and $d^{in}(u)$ similarly.
Define the average degree of nodes $\bar{d} = m/n$. If the graph is
weighted, for each $(u, v) \in E$ there is some positive weight $w_{u,v}$;
otherwise we define $w_{u,v} = 1/d^{out}(u)$ for all $(u, v) \in E$. For
simplicity we assume the weights are normalized such that for all u,
$\sum_v w_{u,v} = 1$.

The personalized PageRank from source distribution $\sigma$ to target node
t can be defined using linear algebra as the solution to the equation
$\pi_\sigma = \alpha\sigma + (1 - \alpha)\pi_\sigma W$, where $\alpha$ is
the teleport probability and $W$ is the matrix of edge weights $w_{u,v}$,
or equivalently defined using random walks

$\pi_\sigma(t) = \Pr[\text{a random walk starting from } s \sim \sigma \text{ of length} \sim \mathrm{Geometric}(\alpha) \text{ stops at } t]$

as shown in [2]. For concreteness, in this paper we often assume
$\sigma = e_s$ for some single node s (meaning the random walks always
start at a single node s), but all results extend in a straightforward
manner to any starting distribution $\sigma$.
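
The random-walk definition can be sketched directly as a Monte Carlo estimator. The following is a minimal illustration, not the estimator developed in this paper; it assumes an unweighted graph stored as an adjacency dictionary, and the names `sample_walk_endpoint`, `monte_carlo_ppr` and `toy_graph` are hypothetical.

```python
import random
from collections import Counter


def sample_walk_endpoint(out_neighbors, s, alpha, rng):
    """Endpoint of one walk from s that stops with probability alpha at each node."""
    v = s
    while rng.random() >= alpha:          # continue with probability 1 - alpha
        neighbors = out_neighbors[v]
        if not neighbors:                 # assumption: a walk stuck at a dangling node stops
            break
        v = rng.choice(neighbors)         # unweighted graph: w_{u,v} = 1 / d_out(u)
    return v


def monte_carlo_ppr(out_neighbors, s, alpha, num_walks, seed=0):
    """Estimate pi_s(v) for every v as the fraction of walks from s ending at v."""
    rng = random.Random(seed)
    counts = Counter(
        sample_walk_endpoint(out_neighbors, s, alpha, rng) for _ in range(num_walks)
    )
    return {v: count / num_walks for v, count in counts.items()}


# Hypothetical 3-node example graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0.
toy_graph = {0: [1, 2], 1: [2], 2: [0]}
estimate = monte_carlo_ppr(toy_graph, s=0, alpha=0.2, num_walks=20000)
```

On this toy graph the linear-algebra definition can be solved by hand ($\pi_0(0) = 0.2/0.424$), which gives a convenient check on the sampled frequencies.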

Personalized PageRank was first defined in the original 
PageRank paper [20]. For more on the motivation of Per¬ 
sonalized PageRank, see [13] and the survey [10]. 

3. PAGERANK ESTIMATION 

In this section, we present our new bidirectional algorithm for PageRank
estimation. We first develop the basic algorithm along with its
theoretical performance guarantees; next, we outline some extensions of
the basic algorithm; finally, we conclude the section with simulations
demonstrating the efficiency of our technique.

The Bidirectional-PPR Algorithm

At a high level, our algorithm estimates $\pi_s(t)$ by first working
backwards from t to find a set of intermediate nodes ‘near’ t and then
generating random walks forwards from s to detect this set.

The reverse work from t is done via the Approx-Contributions algorithm
(see Algorithm 1) of Andersen et al. [1], which, given a target t and a
desired additive error-bound $r_{max}$, produces estimates $p^t(s)$ of the
PPR $\pi_s(t)$ for every start node s. More specifically, the
Approx-Contributions algorithm produces two non-negative vectors
$p^t \in \mathbb{R}^n$ and $r^t \in \mathbb{R}^n$ which satisfy the
following invariant (Lemma 1 in [1]):

$\pi_s(t) = p^t(s) + \sum_{v \in V} \pi_s(v)\, r^t(v).$    (1)

Approx-Contributions terminates once each residual value
$r^t(v) < r_{max}$; now, viewing $\sum_{v \in V} \pi_s(v)\, r^t(v)$ as an
error term, Andersen et al. observe that $p^t(s)$ estimates $\pi_s(t)$ up
to a maximum additive error of $r_{max}$.

Our Bidirectional-PPR algorithm is based on the observation that in order
to estimate $\pi_s(t)$ for a particular (s, t) pair, we can boost the
accuracy by sampling and adding the residual values $r^t(v)$ from nodes v
which are sampled from $\pi_s$. To see this, we first interpret Equation
(1) as an expectation:

$\pi_s(t) = p^t(s) + \mathbb{E}_{v \sim \pi_s}[r^t(v)].$

Now, since $\max_v r^t(v) < r_{max}$, the expectation
$\mathbb{E}_{v \sim \pi_s}[r^t(v)]$ can be efficiently estimated using
Monte Carlo. To do so, we generate $w = c\, r_{max}/\delta$ random walks
of length $\mathrm{Geometric}(\alpha)$ from start node s; here c is a
parameter which depends on the desired accuracy, $r_{max}$ is the maximum
residual after running Approx-Contributions, and $\delta$ is the minimum
PPR value we want to accurately estimate. Let $V_i$ be the final node of
the $i$th random walk; note that $\Pr[V_i = v] = \pi_s(v)$. Let
$X_i = r^t(V_i)$ denote the residual from the final node of the $i$th
random walk, and $\bar{X} = \frac{1}{w}\sum_{i=1}^{w} X_i$. Then
Bidirectional-PPR returns as an estimate of $\pi_s(t)$:

$\hat{\pi}_s(t) = p^t(s) + \bar{X}.$

The complete pseudocode is given in Algorithm 2.


Algorithm 1 Approx-Contributions(G, α, t, r_max) [1]
Inputs: graph G with edge weights w_{u,v}, teleport probability α,
        target node t, maximum residual r_max
1: Initialize (sparse) estimate-vector p^t = 0 and (sparse)
   residual-vector r^t = e_t (i.e. r^t(v) = 1 if v = t; else 0)
2: while ∃ v ∈ V s.t. r^t(v) > r_max do
3:    for u ∈ N^in(v) do
4:       r^t(u) += (1 − α) w_{u,v} r^t(v)
5:    end for
6:    p^t(v) += α r^t(v)
7:    r^t(v) = 0
8: end while
9: return (p^t, r^t)
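
As an illustration of Algorithm 1, the sketch below implements the push loop in Python for an unweighted graph, under the assumption $w_{u,v} = 1/d^{out}(u)$. The function name `approx_contributions`, the dictionary-based graph layout, and the toy graph are hypothetical choices for this example. One deliberate deviation: the residual at v is cleared before it is distributed, so that a self-loop at v would not lose mass.

```python
from collections import defaultdict


def approx_contributions(in_neighbors, out_degree, t, alpha, r_max):
    """Reverse push from target t; returns sparse dicts (p, r) with every r[v] <= r_max."""
    p = defaultdict(float)   # estimates p^t
    r = defaultdict(float)   # residuals r^t
    r[t] = 1.0
    pending = [t]            # nodes whose residual may exceed r_max
    while pending:
        v = pending.pop()
        residual = r[v]
        if residual <= r_max:
            continue         # stale entry: v was already pushed
        r[v] = 0.0
        p[v] += alpha * residual
        for u in in_neighbors.get(v, []):
            # Unweighted assumption: w_{u,v} = 1 / d_out(u).
            r[u] += (1.0 - alpha) * residual / out_degree[u]
            if r[u] > r_max:
                pending.append(u)
    return dict(p), dict(r)


# Hypothetical toy graph with edges 0->1, 0->2, 1->2, 2->0, stored by in-neighbors.
in_nbrs = {0: [2], 1: [0], 2: [0, 1]}
out_deg = {0: 2, 1: 1, 2: 1}
p_t, r_t = approx_contributions(in_nbrs, out_deg, t=2, alpha=0.2, r_max=0.01)
```

Because each push preserves Equation (1), the returned pair can be checked against exact PPR values on a small graph.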


Algorithm 2 Bidirectional-PPR(s, t, δ)

Inputs: graph G, teleport probability α, start node s, target node t,
        minimum probability δ, accuracy parameter c (in our experiments
        we use c = 7)

1: Choose r_max = c_balance √δ, where c_balance is tuned to balance
   forward and reverse work. (For greater efficiency, use the balanced
   version described in Section 3.)

2: (p^t, r^t) = Approx-Contributions(G, α, t, r_max)

3: Set number of walks w = c r_max/δ (cf. Theorem 1)

4: for index i ∈ [w] do

5:    Sample a random walk starting from s (sampling a start from s if s
      is a distribution), stopping after each step with probability α;
      let V_i be the endpoint

6:    Set X_i = r^t(V_i)

7: end for

8: return \hat{π}_s(t) = p^t(s) + (1/w) Σ_{i ∈ [w]} X_i
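
The two phases of Algorithm 2 can be combined in a few dozen lines. The sketch below is a simplified illustration rather than the authors' implementation: it assumes an unweighted graph in which every node appears as a key of `out_nbrs`, takes `r_max` as an explicit argument instead of tuning it, and uses the hypothetical names `reverse_push` and `bidirectional_ppr`.

```python
import math
import random
from collections import defaultdict


def reverse_push(in_nbrs, out_deg, t, alpha, r_max):
    """Reverse phase (Algorithm 1) for an unweighted graph."""
    p, r = defaultdict(float), defaultdict(float)
    r[t] = 1.0
    pending = [t]
    while pending:
        v = pending.pop()
        residual = r[v]
        if residual <= r_max:
            continue
        r[v] = 0.0
        p[v] += alpha * residual
        for u in in_nbrs[v]:
            r[u] += (1.0 - alpha) * residual / out_deg[u]
            if r[u] > r_max:
                pending.append(u)
    return p, r


def bidirectional_ppr(out_nbrs, s, t, alpha, delta, r_max, c=7.0, seed=0):
    """Estimate pi_s(t) as p^t(s) plus the mean residual at w random-walk endpoints."""
    in_nbrs = defaultdict(list)
    for u, nbrs in out_nbrs.items():
        for v in nbrs:
            in_nbrs[v].append(u)
    out_deg = {u: len(nbrs) for u, nbrs in out_nbrs.items()}

    p, r = reverse_push(in_nbrs, out_deg, t, alpha, r_max)

    w = max(1, math.ceil(c * r_max / delta))   # number of walks, w = c * r_max / delta
    rng = random.Random(seed)
    total = 0.0
    for _ in range(w):
        v = s
        # Stop with probability alpha at each node (or at a dangling node).
        while rng.random() >= alpha and out_nbrs[v]:
            v = rng.choice(out_nbrs[v])
        total += r[v]                           # X_i = r^t(V_i)
    return p[s] + total / w


toy_graph = {0: [1, 2], 1: [2], 2: [0]}
estimate = bidirectional_ppr(toy_graph, s=0, t=2, alpha=0.2, delta=0.01, r_max=0.1)
```

Since every residual lies in $[0, r_{max}]$, the estimate can never differ from the true value by more than $r_{max}$, whatever the random walks do; the Monte Carlo phase only tightens this.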


Accuracy Analysis

We first prove that Bidirectional-PPR returns an estimate with the
desired accuracy with high probability:

Theorem 1. Given start node s (or source distribution $\sigma$), target
t, minimum PPR $\delta$, maximum residual
$r_{max} > \frac{2e\delta}{\alpha\epsilon}$, relative error
$\epsilon \le 1$, and failure probability $p_{fail}$, Bidirectional-PPR
outputs an estimate $\hat{\pi}_s(t)$ such that with probability at least
$1 - p_{fail}$ the following hold:

• If $\pi_s(t) \ge \delta$: $|\pi_s(t) - \hat{\pi}_s(t)| \le \epsilon\,\pi_s(t)$.

• If $\pi_s(t) \le \delta$: $|\pi_s(t) - \hat{\pi}_s(t)| \le 2e\delta$.

The above result shows that the estimate $\hat{\pi}_s(t)$ can be used to
distinguish between ‘significant’ and ‘insignificant’ PPR pairs: for pair
(s, t), Theorem 1 guarantees that if
$\pi_s(t) \ge \frac{(1 + 2e)\delta}{1 - \epsilon}$, then the estimate is
greater than $(1 + 2e)\delta$, whereas if $\pi_s(t) < \delta$, then the
estimate is less than $(1 + 2e)\delta$. The assumption
$r_{max} > \frac{2e\delta}{\alpha\epsilon}$ is easily satisfied, as
typically $\delta = O(1/n)$ and $r_{max} = \Omega(1/\sqrt{m})$.

Proof. As shown in Algorithm 2, we will average over
$w = c\, r_{max}/\delta$ walks, where c is a parameter we choose later.
Each walk is of length $\mathrm{Geometric}(\alpha)$, and we denote $V_i$
as the last node visited by the walk, so that $V_i \sim \pi_s$. Let
$X_i = r^t(V_i)$. The estimate returned by Bidirectional-PPR is

$\hat{\pi}_s(t) = p^t(s) + \frac{1}{w}\sum_{i=1}^{w} X_i.$

First, from Equation (1), we have that
$\mathbb{E}[\hat{\pi}_s(t)] = \pi_s(t)$. Moreover, Approx-Contributions
guarantees that for all v, $r^t(v) < r_{max}$, and so each $X_i$ is
bounded in $[0, r_{max}]$. Before applying Chernoff bounds, we rescale
$X_i$ by defining $Y_i = \frac{1}{r_{max}} X_i \in [0, 1]$, and we define
$Y = \sum_{i=1}^{w} Y_i$.

We will show concentration of the estimates via the following two
Chernoff bounds (see Theorem 1.1 in [?]):

1. $\Pr[|Y - \mathbb{E}[Y]| > \epsilon\,\mathbb{E}[Y]] \le 2\exp\left(-\frac{\epsilon^2}{3}\mathbb{E}[Y]\right)$

2. For any $b > 2e\,\mathbb{E}[Y]$, $\Pr[Y > b] \le 2^{-b}$

We perform a case analysis based on whether $\mathbb{E}[X_i] \ge \delta$
or $\mathbb{E}[X_i] < \delta$.

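First suppose $\mathbb{E}[X_i] \ge \delta$. This implies that
$\pi_s(t) \ge \delta$ so we will prove a relative error bound of
$\epsilon$. Now we have
$\mathbb{E}[Y] = \frac{w}{r_{max}}\mathbb{E}[X_i] = \frac{c}{\delta}\mathbb{E}[X_i] \ge c$,
and thus:

$$
\begin{aligned}
\Pr[|\hat{\pi}_s(t) - \pi_s(t)| > \epsilon\,\pi_s(t)]
  &\le \Pr[|\bar{X} - \mathbb{E}[X_i]| > \epsilon\,\mathbb{E}[X_i]] \\
  &= \Pr[|Y - \mathbb{E}[Y]| > \epsilon\,\mathbb{E}[Y]] \\
  &\le 2\exp\left(-\frac{\epsilon^2}{3}\mathbb{E}[Y]\right) \\
  &\le 2\exp\left(-\frac{\epsilon^2}{3}c\right) \\
  &\le p_{fail},
\end{aligned}
$$

where the last line holds as long as we choose

$$c \ge \frac{3}{\epsilon^2}\ln\frac{2}{p_{fail}}.$$

Suppose alternatively that $\mathbb{E}[X_i] < \delta$. Then

$$
\begin{aligned}
\Pr[|\hat{\pi}_s(t) - \pi_s(t)| > 2e\delta]
  &= \Pr[|\bar{X} - \mathbb{E}[X_i]| > 2e\delta] \\
  &= \Pr\left[|Y - \mathbb{E}[Y]| > \frac{w}{r_{max}}2e\delta\right] \\
  &\le \Pr\left[Y > \frac{w}{r_{max}}2e\delta\right].
\end{aligned}
$$

At this point we set $b = \frac{w}{r_{max}}2e\delta = 2ec$ and apply the
second Chernoff bound. Note that
$\mathbb{E}[Y] = \frac{c}{\delta}\mathbb{E}[X_i] < c$, and hence we
satisfy $b > 2e\,\mathbb{E}[Y]$. The second bound implies that

$$\Pr[|\hat{\pi}_s(t) - \pi_s(t)| > 2e\delta] \le 2^{-b} \le p_{fail} \qquad (2)$$

as long as we choose c such that:

$$c \ge \frac{1}{2e}\log_2\frac{1}{p_{fail}}.$$

If $\pi_s(t) \le \delta$, then equation 2 completes our proof.

The only remaining case is when $\pi_s(t) > \delta$ but
$\mathbb{E}[X_i] < \delta$. This implies that $p^t(s) > 0$ since
$\pi_s(t) = p^t(s) + \mathbb{E}[X_i]$. In the Approx-Contributions
algorithm when we increase $p^t(s)$, we always increase it by at least
$\alpha r_{max}$, so we have $p^t(s) \ge \alpha r_{max}$. We have that

$$\frac{|\hat{\pi}_s(t) - \pi_s(t)|}{\pi_s(t)} \le \frac{|\hat{\pi}_s(t) - \pi_s(t)|}{\alpha r_{max}}.$$

By assumption, $\frac{2e\delta}{\alpha r_{max}} < \epsilon$, so by
equation 2,

$$\Pr\left[\frac{|\hat{\pi}_s(t) - \pi_s(t)|}{\pi_s(t)} > \epsilon\right] \le p_{fail}.$$

The proof is completed by combining all cases and choosing
$c = \frac{3}{\epsilon^2}\ln\left(\frac{2}{p_{fail}}\right)$. We note that
the constants are not optimized; in experiments we find that c = 7 gives
mean relative error less than 8% on a variety of graphs. □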
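
To make the constants in the proof concrete, the following sketch computes the value of c required by the two Chernoff-bound conditions and the resulting number of walks $w = c\, r_{max}/\delta$. The function names and the sample parameter values are hypothetical, chosen only for illustration.

```python
import math


def accuracy_constant(eps, p_fail):
    """Smallest c satisfying both conditions used in the proof of Theorem 1."""
    c_relative = 3.0 / eps ** 2 * math.log(2.0 / p_fail)      # first Chernoff bound
    c_additive = math.log2(1.0 / p_fail) / (2.0 * math.e)     # second Chernoff bound
    return max(c_relative, c_additive)


def number_of_walks(eps, p_fail, r_max, delta):
    """Number of random walks w = c * r_max / delta, rounded up."""
    return math.ceil(accuracy_constant(eps, p_fail) * r_max / delta)


# Example parameters (assumed for illustration only).
walks = number_of_walks(eps=0.5, p_fail=0.01, r_max=1e-3, delta=1e-6)
```

For $\epsilon \le 1$ the first condition dominates, which is why the proof settles on $c = \frac{3}{\epsilon^2}\ln(2/p_{fail})$; the much smaller c = 7 used in the experiments reflects that these worst-case constants are not tight.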


Running Time Analysis 

The runtime of Bidirectional-PPR depends on the target t: if t has many
in-neighbors and/or large global PageRank $\pi(t)$, then the running time
will be slower than for a random t. Theorem 1 of [1] states that
Approx-Contributions(G, α, t, r_max) performs
$\frac{n\pi(t)}{\alpha r_{max}}$ pushback operations, and the exact
running time is proportional to the sum of the in-degrees of all the
nodes where we pushback from. In the worst case, we might have
$d^{in}(t) = \Theta(n)$ and Bidirectional-PPR takes $\Theta(n)$ time.
However, for a uniformly chosen target node, we can prove the following:

Theorem 2. For any start node s (or source distribution $\sigma$),
minimum PPR $\delta$, maximum residual $r_{max}$, relative error
$\epsilon$, and failure probability $p_{fail}$, if the target t is chosen
uniformly at random, then Bidirectional-PPR has expected running time

$$O\left(\sqrt{\frac{\bar{d}}{\delta}}\;\frac{\sqrt{\log(1/p_{fail})}}{\alpha\epsilon}\right).$$

In contrast, the running time for Monte-Carlo to achieve the same
accuracy guarantee is
$O\left(\frac{1}{\delta}\frac{\log(1/p_{fail})}{\alpha\epsilon^2}\right)$,
and the running time for Approx-Contributions is
$O\left(\frac{\bar{d}}{\delta\alpha}\right)$. The fastest previous
algorithm for this problem, the FAST-PPR algorithm of [19], has an
average running time bound of

$$O\left(\frac{1}{\alpha\epsilon^2}\sqrt{\frac{\bar{d}}{\delta}}\sqrt{\frac{\log(1/p_{fail})\log(1/\delta)}{\log(1/(1-\alpha))}}\right)$$

for uniformly chosen targets. The running time bound of Bidirectional-PPR
is thus asymptotically better than FAST-PPR, and in experiments the
constants required for the same accuracy are smaller, making
Bidirectional-PPR 3 to 8 times faster on a diverse set of graphs.

Proof. In [18], it is proven that for a uniform random t,
Approx-Contributions runs in average time
$\frac{\bar{d}}{\alpha r_{max}}$, where $\bar{d}$ is the average degree
of a node. On the other hand, from Theorem 1, we know that we need to
generate $O\left(\frac{r_{max}}{\delta\epsilon^2}\ln(1/p_{fail})\right)$
random walks, each of which can be sampled in average time $1/\alpha$.
Finally, we choose
$r_{max} = \epsilon\sqrt{\frac{\bar{d}\delta}{\ln(2/p_{fail})}}$ to
minimize our running time bound and get the claimed result. □
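
The balancing step in this proof can be written out explicitly. The derivation below is a sketch that suppresses constant factors and writes $L = \ln(2/p_{fail})$ as shorthand (a symbol introduced here for convenience, not used elsewhere in the paper).

```latex
% Total expected work: reverse push plus forward random walks.
T(r_{\max}) \;=\; \frac{\bar d}{\alpha\, r_{\max}}
            \;+\; \frac{r_{\max}}{\alpha\,\delta\,\epsilon^{2}}\, L ,
\qquad L = \ln\frac{2}{p_{fail}} .

% Setting dT/dr_{\max} = 0:
-\frac{\bar d}{\alpha\, r_{\max}^{2}} + \frac{L}{\alpha\,\delta\,\epsilon^{2}} = 0
\;\;\Longrightarrow\;\;
r_{\max}^{*} = \epsilon \sqrt{\frac{\bar d\,\delta}{L}} .

% Both terms are equal at the optimum:
T(r_{\max}^{*}) \;=\; \frac{2}{\alpha\,\epsilon}\sqrt{\frac{\bar d\, L}{\delta}}
\;=\; O\!\left(\sqrt{\frac{\bar d}{\delta}}\;
      \frac{\sqrt{\log(1/p_{fail})}}{\alpha\,\epsilon}\right).
```

At the optimum the forward and reverse phases do the same amount of work, which is the property the dynamic balancing heuristic aims for in practice.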

Extensions

Bidirectional-PageRank extends naturally to generalized PageRank using a
source distribution $\sigma$ rather than a single start node: we simply
sample an independent starting node for each walk, and replace $p^t(s)$
with the expected value of $p^t(s)$ when s is sampled from the starting
distribution.

The dynamic runtime-balancing method proposed in [19] can improve the
running time of Bidirectional-PageRank in practice. In this technique,
$r_{max}$ is chosen dynamically in order to balance the amount of time
spent by Approx-Contributions and the amount of time spent generating
random walks. To implement this, we modify Approx-Contributions to use a
priority queue in order to always push from the node v with the largest
value of $r^t(v)$. We also change the while loop so that it terminates
when the amount of time spent achieving the current value of $r_{max}$
first exceeds the predicted amount of time required for sampling random
walks, $c_{walk} \cdot c \cdot \frac{r_{max}}{\delta}$, where $c_{walk}$
is the average time it takes to sample a random walk. For full
pseudocode, see [17].

Experimental Validation

We now compare Bidirectional-PPR to its predecessor algorithms (namely:
FAST-PPR [18], Monte Carlo [2, 9] and Approx-Contributions [1]). The
experimental setup is identical to that in [18]; for convenience, we
describe it here in brief. We perform experiments on 6 diverse,
real-world networks: two directed social networks (Pokec (31M edges) and
Twitter-2010 (1.5 billion edges)), two undirected social networks
(Live-Journal (69M edges) and Orkut (117M edges)), a collaboration
network (dblp (6.7M edges)), and a web-graph (UK-2007-05 (3.7 billion
edges)). Since all algorithms have parameters that enable a trade-off
between running time and accuracy, we first choose parameters such that
the mean relative error of each algorithm is approximately 10%. For
Bidirectional-PPR, we find that setting c = 7 (i.e., generating
$7 \cdot \frac{r_{max}}{\delta}$ random walks) results in a mean relative
error less than 8% on all graphs; for the other algorithms, we use the
settings determined in [18]. We then repeatedly sample a uniformly-random
start node $s \in V$, and a random target $t \in T$ sampled either
uniformly or from PageRank (to emphasize more important targets). For
both Bidirectional-PPR and FAST-PPR, we used the dynamic-balancing
heuristic described above. The results are shown in Figure 1.

Note that Bidirectional-PPR is 3 to 8 times faster than FAST-PPR across
all graphs. In particular, Bidirectional-PPR only needs to sample
$7\frac{r_{max}}{\delta}$ random walks, while FAST-PPR needs
$350\frac{r_{max}}{\delta}$ walks to achieve the same mean relative
error. This is because Bidirectional-PPR is unbiased, while FAST-PPR has
a bias from Approx-Contributions.

Figure 1: Average running-time (on log-scale) for different networks. We measure the time required for
estimating PPR values $\pi_s(t)$ with threshold $\delta$ for 1000 (s,t) pairs. For each pair, the start node is sampled
uniformly, while the target node is sampled uniformly in Figure 1(a), or from the global PageRank distribution 
in Figure 1(b). In this plot we use teleport probability a = 0.2. 


4. PERSONALIZED PAGERANK SEARCH 

We now turn from Personalized PageRank estimation to the Personalized
PageRank search problem:

Given a start node s (or distribution $\sigma$) and a query q which
filters the set of all targets to some list $T = \{t_i\} \subseteq V$,
return the top-k targets ranked by $\pi_s[t_i]$.

We consider as baselines two algorithms which require no
pre-computation. They are efficient for certain ranges of $|T|$, but our
experiments show they are too slow for real-time search across most
values of $|T|$:

• Monte-Carlo [2, 9]: Sample random walks from s, and filter out any walk
whose endpoint is not in T. If we desire $n_s$ samples, this takes time
$O(n_s/\pi_s[T])$, where $\pi_s[T] := \sum_{t \in T}\pi_s[t]$ is the
probability that a random walk terminates in T. This method works well if
T is large, but in our experiments on Twitter-2010 it takes minutes per
query for $|T| = 1000$ (and hours per query for $|T| = 10$).

• Bidirectional-PPR: On the other hand, we can estimate $\pi_s[t]$ to
each $t \in T$ separately using Bidirectional-PPR. This has an
average-case running time $O\left(|T|\sqrt{\bar{d}/\delta_k}\right)$,
where $\delta_k$ is the PPR of the $k$th best target. This method works
well if T is small, but is too slow for large T; in our experiments, it
takes on the order of seconds for $|T| \le 100$, but more than a minute
for $|T| = 1000$.

If we are allowed pre-computation, then we can improve upon
Bidirectional-PPR by precomputing and storing a reverse vector from all
target nodes. To this end, we first observe that the estimate
$\hat{\pi}_s[t]$ can be written as a dot-product. Let $\tilde{\pi}_s$ be
the empirical distribution over terminal nodes due to w random walks from
s (with w chosen as in Theorem 1); we define the forward vector
$x_s \in \mathbb{R}^{2n}$ to be the concatenation of the basis vector
$e_s$ and the random-walk terminal node distribution. On the other hand,
we define the reverse vector $y^t \in \mathbb{R}^{2n}$ to be the
concatenation of the estimates $p^t$ and the residuals $r^t$. Formally,
define

$x_s = (e_s, \tilde{\pi}_s) \in \mathbb{R}^{2n}, \qquad y^t = (p^t, r^t) \in \mathbb{R}^{2n}$    (3)

Now, from Algorithm 2, we have

$\hat{\pi}_s[t] = \langle x_s, y^t \rangle.$    (4)

The above observation motivates the following algorithm: 


• BiPPR-Precomp: In this approach, we first use Approx-Contributions to
pre-compute and store a reverse vector $y^t$ for each $t \in V$. At query
time, we generate random walks to form the forward vector $x_s$; now,
given any set of targets T, we compute $|T|$ dot-products
$\langle x_s, y^t \rangle$, and use these to rank the targets. This
method now has a worst-case running time
$O\left(|T|\sqrt{\bar{d}/\delta_k}\right)$. In practice, it works well if
T is small, but is too slow for large T. In our experiments (doing
100,000 random walks at runtime) this approach takes around a second for
$|T| \le 30$, but this climbs to a minute for $|T| = 10,000$.

The BiPPR-Precomp approach is faster than Bidirectional-PPR (at the cost
of additional precomputation and storage), and also faster than
Monte-Carlo for small sets T, but it is still not efficient enough for
real-time personalized search. This motivates us to find a more efficient
algorithm that scales better than Bidirectional-PPR for large T, yet is
fast for small $|T|$. In the following sections, we propose two different
approaches for this: the first based on pre-grouping the precomputed
reverse-vectors, and the second based on sampling target nodes from T
according to PPR. For convenience, we first summarize the two approaches:

• BiPPR-Precomp-Grouped: Here, as in BiPPR-Precomp, we compute an
estimate to each $t \in T$ using Bidirectional-PPR. However, we leverage
the sparsity of the reverse vectors $y^t = (p^t, r^t)$ by first grouping
them in a way we will describe. This makes the dot-product more
efficient. This method has a worst-case running time of
$O\left(|T|\sqrt{\bar{d}/\delta_k}\right)$, and in experiments we find it
is much faster than BiPPR-Precomp. For our parameter choices its running
time is less than 250ms across the range of $|T|$ we tried.

• BiPPR-Precomp-Sampling: We again first pre-compute the reverse vectors
$y^t$. Next, for a given target t, we define the expanded target-set
$T_t = \{v \in [2n] \mid y^t[v] \neq 0\}$, i.e., the set of nodes with
non-zero reverse vectors from t. At run-time, we now sample random walks
forward from s to nodes in the expanded target sets. Using these, we
create a sampler in average time $O(r_{max}/\delta_k)$ (where as before
$\delta_k$ is the $k$th largest PPR value $\pi_s[t_k]$), which samples
nodes $t \in T$ with probability proportional to the PPR $\pi_s[t]$. We
describe this in detail in Section 4.2. Once the sampler has been
created, it can be sampled in O(1) time per sample. The algorithm works
well for any size of T, and has the unique property that it can identify
the top-k target nodes without computing a score for all $|T|$ of them.
For our parameter choice its running time is less than 250ms across the
range of $|T|$ we tried.

We note here that the case k = 1 (i.e., for finding the top PPR node)
corresponds to solving a Maximum Inner Product Problem. In a recent line
of work, Shrivastava and Li [21, 22] propose a sublinear time algorithm
for this problem based on Locality Sensitive Hashing; however, their
method assumes that there is some bound U on $\|y^t\|_2$ and that
$\max_t \langle x_s, y^t \rangle$ is a large fraction of U. In
personalized search, we usually encounter small values of
$\max_t \langle x_s, y^t \rangle$ relative to $\max_t \|y^t\|_2$; finding
an LSH for Maximum Inner Product Search in this regime is an interesting
open problem for future research. Our two approaches bypass this by
exploiting particular structural features of the problem:
BiPPR-Precomp-Grouped exploits the sparsity of the reverse vectors to
speed up the dot-product, and BiPPR-Precomp-Sampling exploits the skewed
distribution of PPR scores to find the top targets without even computing
full dot-products.

4.1 Bidirectional-PPR with Grouping 

In this method we improve the running-time of BiPPR-Precomp by
pre-grouping the reverse vectors corresponding to each target set T.
Recall that in BiPPR-Precomp, we first pre-compute reverse vectors
$y^t = (p^t, r^t) \in \mathbb{R}^{2n}$ using Approx-Contributions for
each t. At run-time, given s, we compute forward vector
$x_s = (e_s, \tilde{\pi}_s)$ by generating sufficient random-walks, and
then compute the scores $\langle x_s, y^t \rangle$ for $t \in T$. Our
main observation is that we can decrease the running time of the
dot-products by pre-grouping the vectors $y^t$ by coordinate. The
intuition behind this is that in each dot product
$\sum_v x_s[v]\, y^t[v]$, the nodes v where $x_s[v] \neq 0$ often don’t
have $y^t[v] \neq 0$, and most of the product terms are 0. Hence, we can
improve the running time by grouping the vectors $y^t$ in advance by
coordinate v. Now, at run-time, for each v such that $x_s[v] \neq 0$, we
can efficiently iterate over the set of targets t such that
$y^t[v] \neq 0$.

An alternative way to think about this is as a sparse matrix-vector
multiplication $Y^T x_s$ after we form a matrix $Y^T$ whose rows are
$y^t$ for $t \in T$. This optimization can then be seen as a sparse
column representation of that matrix.


Algorithm 3 BiPPRGroupedPrecomputation(T, r_max)
Inputs: Graph G, teleport probability α, target nodes T,
        maximum residual r_max

1: z ← empty hash map of vectors such that for any v, z[v]
   defaults to an empty (sparse) vector in R^{|T|}
2: for t ∈ T do
3:    Compute y^t = (p^t, r^t) ∈ R^{2|V|} via
      Approx-Contributions(G, α, t, r_max)
4:    for v ∈ [2|V|] such that y^t[v] > 0 do
5:       z[v][t] = y^t[v]
6:    end for
7: end for
8: return z


Algorithm 4 BiPPRGroupedRankTargets(s, r_max, z)
Inputs: Graph G, teleport probability α, start node s, maximum residual
        r_max, hash map z of reverse vectors grouped by coordinate

1: Set number of walks w = c r_max/δ (In experiments we found c = 20
   achieved precision@3 above 90%.)
2: Sample w random-walks of length Geometric(α) from s; compute
   \tilde{π}_s[v] = fraction of walks ending at node v
3: Compute x_s = (e_s, \tilde{π}_s) ∈ R^{2|V|}
4: Initialize empty map score from T to R
5: for v such that x_s[v] > 0 do
6:    for t such that z[v][t] > 0 do
7:       score(t) += x_s[v] z[v][t]
8:    end for
9: end for
10: Return T sorted in decreasing order of score


We refer to this method as BiPPR-Precomp-Grouped; the complete
pseudo-code is given in Algorithm 4. The correctness of this method
follows again from Theorem 1. In experiments, this method is efficient
across all the sizes of T we tried, taking less than 250 ms even for
$|T| = 10,000$. The improved running time of BiPPR-Precomp-Grouped comes
at the cost of more storage compared to BiPPR-Precomp. In the case of
name search, where each target typically only has a first and last name,
each vector $y^t$ only appears in two of these pre-grouped structures, so
the storage is only twice the storage of BiPPR-Precomp. On the other hand
if a target t contains many keywords, $y^t$ will be included in many of
these pre-grouped data structures, and storage cost will be significantly
greater than for BiPPR-Precomp.
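
The grouping idea can be illustrated with sparse dictionaries. The sketch below assumes the reverse vectors have already been computed and are given as `{target: {coordinate: value}}`; the function names, the `("p", node)` / `("r", node)` coordinate labels for the two halves of the concatenated vectors, and the example values are hypothetical.

```python
from collections import defaultdict


def group_by_coordinate(reverse_vectors):
    """Precomputation: z[v][t] = y^t[v] for every positive entry."""
    z = defaultdict(dict)
    for t, y_t in reverse_vectors.items():
        for v, value in y_t.items():
            if value > 0:
                z[v][t] = value
    return z


def rank_targets(x_s, z, targets):
    """Query time: accumulate <x_s, y^t> by visiting only coordinates where x_s is non-zero."""
    score = {t: 0.0 for t in targets}
    for v, x_value in x_s.items():
        if x_value <= 0:
            continue
        for t, y_value in z.get(v, {}).items():
            if t in score:
                score[t] += x_value * y_value
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Assumed example: three walks from s end at a, b, c; p^t(s) = 0 for all targets.
x_s = {("p", "s"): 1.0, ("r", "a"): 1 / 3, ("r", "b"): 1 / 3, ("r", "c"): 1 / 3}
reverse_vectors = {
    "t1": {("r", "b"): 0.64, ("r", "c"): 0.4},
    "t2": {("r", "c"): 0.16},
    "t3": {("r", "c"): 0.16},
}
z = group_by_coordinate(reverse_vectors)
ranking = rank_targets(x_s, z, targets=list(reverse_vectors))
```

The inner loop touches only the (coordinate, target) pairs with non-zero products, so the cost scales with the overlap between the walk endpoints and the expanded target sets rather than with $|T|$ times the vector length.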

4.2 Sampling from Targets Matching a Query 

The key idea behind this alternate method for PPR-search is that by
sampling a target t in proportion to its PageRank we can quickly find the
top targets without iterating over all of them. After drawing many
samples, the targets can be ranked according to the number of times they
are sampled. Alternatively a full Bidirectional-PPR query can be issued
for some subset of the targets before ranking. This approach exploits the
skewed distribution of PPR scores in order to find the top targets. In
particular, prior empirical work has shown that on the Twitter graph, for
each fixed s, the values $\pi_s[t]$ follow a power law [3].

We define the PPR-Search Sampling Problem as follows:

Given a source distribution s and a query q which filters the set of all
targets to some list $T = \{t_i\} \subseteq V$, sample a target $t_i$
with probability $p[t_i] = \frac{\pi_s[t_i]}{\sum_{t \in T}\pi_s[t]}$.

We develop two solutions to this sampling problem. The first, in
$O(w) = O(r_{max}/\delta_k)$ time, generates a data structure which can
generate an arbitrary number of independent samples from a distribution
which approximates the correct distribution. The second can generate
samples from the exact distribution $\pi_s[t_i]$, and generates complete
paths from s to some $t \in T$, but requires time $O(r_{max}/\pi_s[T])$
per sample. Because the approximate sampler is more efficient, we present
that here and defer the exact sampler to [17].

The BiPPR-Precomp-Sainpling Algorithm 

The high-level idea behind our method is hierarchical sampling.
Recall that the start node $s$ has an associated forward
vector $x_s = (e_s, \tilde{\pi}_s)$, and from each target $t$ we have a reverse
vector $y^t$; the PPR estimate is given by $\pi_s[t] \approx \langle x_s, y^t \rangle$.
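As a minimal sketch of this dot-product estimate, the function below evaluates $\langle x_s, y^t \rangle$ from sparse inputs. It assumes the reverse vector splits as $y^t = (p^t, r^t)$ (estimates and residuals, as in Algorithm 5) and that $\tilde{\pi}_s$ is the empirical distribution of walk endpoints; the function name and the dictionary representation are illustrative choices, not taken from the paper's implementation.

```python
def bidirectional_estimate(s, p_t, r_t, walk_endpoints):
    """Estimate pi_s[t] as <x_s, y^t> = p^t[s] + sum_v pi_tilde_s[v] * r^t[v].

    p_t, r_t: sparse dicts holding the reverse estimates and residuals for t.
    walk_endpoints: list of endpoints of random walks started at s.
    """
    w = len(walk_endpoints)
    # Averaging r^t over walk endpoints equals the inner product of the
    # empirical endpoint distribution with the residual vector.
    residual_term = sum(r_t.get(v, 0.0) for v in walk_endpoints) / w
    return p_t.get(s, 0.0) + residual_term


# Hypothetical toy inputs: four walks ending at a, b, b, c.
estimate = bidirectional_estimate(
    "s", {"s": 0.1}, {"a": 0.2, "b": 0.4}, ["a", "b", "b", "c"]
)
```

Because the estimate is linear in $y^t$, summing reverse vectors over a target set gives the estimate for the whole set, which is what the two-stage sampler relies on.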










Thus we want to sample $t \in T$ with probability:

$$p[t] = \frac{\langle x_s, y^t \rangle}{\sum_{j \in T} \langle x_s, y^j \rangle}$$

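We will sample $t$ in two stages: first we sample an intermediate
node $v \in V$ with probability:

$$p'_s[v] = \frac{x_s[v]\, y^T[v]}{\langle x_s, y^T \rangle}, \quad \text{where } y^T = \sum_{t \in T} y^t.$$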
Following this, we sample $t \in T$ with probability:

$$p_v[t] = \frac{y^t[v]}{y^T[v]}$$


It is easy to verify that $p[t] = \sum_{v \in V} p'_s[v]\, p_v[t]$. Figure 2
shows how the sampling algorithm works on an example
graph. The pseudo-code is given in Algorithm 5 and
Algorithm 6.



Figure 2: Search Example: Given target set $T = \{t_1, t_2, t_3\}$,
for each target $t_i$ we have drawn the expanded
target-set, i.e., nodes $v$ with positive residual
$y^{t_i}[v]$. From source $s$, we sample three random
walks, ending at nodes $a$, $b$, and $c$. Now suppose
$y^{t_1}(b) = 0.64$, $y^{t_1}(c) = 0.4$, $y^{t_2}(c) = 0.16$, and $y^{t_3}(c) = 0.16$
(the remaining residuals are 0). Then
we have $y^T(a) = 0$, $y^T(b) = 0.64$ and $y^T(c) = 0.72$,
and consequently, the sampling weights of $(a, b, c)$ are
$(0, 0.213, 0.24)$. Now, to sample a target, we first sample
from $\{a, b, c\}$ in proportion to its weight. Then if
we sample $b$, we always return $t_1$; if we sample $c$, we
sample $(t_1, t_2, t_3)$ with probability $(5/9, 2/9, 2/9)$.
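The numbers in this example can be checked with exact arithmetic. The sketch below is illustrative only: it takes the residuals from the caption, assumes each of the three walks carries empirical mass 1/3, and ignores the estimate coordinate $p^t[s]$ (the caption gives residuals only). It then confirms that the two-stage probabilities marginalize to the direct target probabilities. All variable names are hypothetical.

```python
from fractions import Fraction as F

# Residuals y^{t_i}[v] from the Figure 2 caption (zero entries omitted).
y = {
    "t1": {"b": F(64, 100), "c": F(40, 100)},
    "t2": {"c": F(16, 100)},
    "t3": {"c": F(16, 100)},
}
# Assumption: three walks ending at a, b, c, each with empirical mass 1/3.
x_s = {"a": F(1, 3), "b": F(1, 3), "c": F(1, 3)}

# y^T[v] = sum over targets t of y^t[v].
y_T = {}
for vec in y.values():
    for v, val in vec.items():
        y_T[v] = y_T.get(v, 0) + val

# Stage 1: unnormalized weights x_s[v] * y^T[v], then normalize.
weights = {v: x_s[v] * y_T.get(v, 0) for v in x_s}
total = sum(weights.values())
p_stage1 = {v: wt / total for v, wt in weights.items()}


def p_stage2(v, t):
    """Stage 2: probability of target t given intermediate node v."""
    return y[t].get(v, 0) / y_T[v]


def two_stage(t):
    """Marginal probability of t under the two-stage sampler."""
    return sum(p * p_stage2(v, t) for v, p in p_stage1.items() if p > 0)


def direct(t):
    """Direct probability <x_s, y^t> / sum_j <x_s, y^j>."""
    def dot(u):
        return sum(x_s.get(v, 0) * val for v, val in y[u].items())
    return dot(t) / sum(dot(u) for u in y)
```

Here `weights` reproduces the caption's sampling weights (16/75 ≈ 0.213 for $b$ and 6/25 = 0.24 for $c$), and `two_stage(t)` equals `direct(t)` for every target.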

Note that we assume that some set of supported queries
is known in advance, and we first pre-compute and store
a separate data structure for each query $Q$ (i.e., for each
target-set $T = \{t \in V : t \text{ is relevant to } Q\}$). In addition,
we can optionally pre-compute $w$ random walks from each
start node $s$ and store the forward vector $x_s$, or we can
compute $x_s$ at query time by sampling random walks.
Running Time: For a small relative error for targets with
$\pi_s[t] > \delta$, we use $w = c\, r_{\max}/\delta$ walks, where $c$ is chosen
as in Theorem 1. The support of our forward sampler is
at most $w$, so its construction time is $O(w)$ using the alias
method of sampling from a discrete distribution [25], [15,
section 3.4.1]. Once constructed, we can get independent
samples in $O(1)$ time. Thus the query time to generate $n_s$
samples is $O(c\, r_{\max}/\delta + n_s)$.
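For readers unfamiliar with the alias method, the following is a minimal sketch of one standard variant (often attributed to Vose); it is not the paper's implementation. The function names are illustrative. Construction is linear in the number of weights and each draw takes constant time, which matches the costs quoted above.

```python
import random


def build_alias(weights):
    """Build alias tables so index i is drawn with probability weights[i]/sum(weights)."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [wt * n / total for wt in weights]  # mean of scaled weights is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # keep column s with this probability
        alias[s] = l          # otherwise redirect to the donor l
        scaled[l] = scaled[l] + scaled[s] - 1.0  # donor gives up the missing mass
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover columns are full up to floating-point rounding.
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

For example, `build_alias([0.0, 0.64 / 3, 0.72 / 3])` builds a sampler over the three walk endpoints of Figure 2; the zero-weight entry is never returned.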


Algorithm 5 SamplerPrecomputationForSet($T$, $r_{\max}$)
Inputs: Graph $G$, teleport probability $\alpha$, target-set $T$,
maximum residual $r_{\max}$

1: for $t \in T$ do
2:    Compute $y^t = (p^t, r^t) \in \mathbb{R}^{2|V|}$ via Approx-Contributions($G$, $\alpha$, $t$, $r_{\max}$)
3: end for
4: Compute $y^T = \sum_{t \in T} y^t$
5: for $v \in V$ such that $y^T[v] > 0$ do
6:    Create sampler$_v$ which samples $t$ with probability
      $p_v[t]$ (for example, using the alias sampling method
      [25], [15, section 3.4.1]).
7: end for
8: return ($y^T$, {sampler$_v$})


Algorithm 6 SampleAndRankTargets($s$, $r_{\max}$, {sampler$_v$})
Inputs: Graph $G$, teleport probability $\alpha$, start node $s$,
maximum residual $r_{\max}$, reverse vectors $y^T$,
intermediate-node-to-target samplers {sampler$_v$}.

1: Set number of walks $w = c\, r_{\max}/\delta$. In experiments
   we found $c = 20$ achieved precision@5 above 90% on
   Twitter-2010.
2: Set number of samples $n_s$ (we use $n_s = w$)
3: Sample $w$ random walks from $s$ and let $\tilde{\pi}_s$ be the
   empirical distribution of their endpoints; compute forward
   vector $x_s = (e_s, \tilde{\pi}_s) \in \mathbb{R}^{2|V|}$
4: Create sampler$_s$ to sample $v \in [2|V|]$ with probability
   $p'_s[v]$, i.e., proportional to $x_s[v]\, y^T[v]$
5: Initialize empty map score from $V$ to $\mathbb{N}$
6: for $j \in [0, n_s - 1]$ do
7:    Sample $v$ from sampler$_s$
8:    Sample $t$ from sampler$_v$
9:    Increment score($t$)
10: end for
11: Return $T$ sorted in decreasing order of score
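The two algorithms can be sketched end to end as follows. This is an illustrative sketch under several assumptions, not the authors' code: reverse vectors are supplied as sparse dictionaries keyed by `("p", v)` for estimate coordinates and `("r", v)` for residual coordinates, a walk that reaches a node without out-edges simply stops there, and `random.choices` stands in for the alias samplers (so draws are not constant-time). Unlike step 11, the sketch returns only targets sampled at least once.

```python
import random
from collections import Counter


def walk_endpoint(graph, s, alpha, rng):
    """Endpoint of one walk from s that stops with probability alpha per step."""
    v = s
    while rng.random() >= alpha:
        out = graph.get(v, [])
        if not out:
            break  # assumption: a walk stops at a node with no out-edges
        v = rng.choice(out)
    return v


def precompute_for_set(reverse_vectors):
    """Analogue of Algorithm 5: y^T plus per-coordinate (targets, weights) lists."""
    y_T = Counter()
    per_coord = {}
    for t, y_t in reverse_vectors.items():
        for coord, val in y_t.items():
            if val > 0:
                y_T[coord] += val
                targets, weights = per_coord.setdefault(coord, ([], []))
                targets.append(t)
                weights.append(val)
    return y_T, per_coord


def sample_and_rank(graph, s, alpha, y_T, per_coord, w, n_s, seed=0):
    """Analogue of Algorithm 6: two-stage sampling, then rank by sample count."""
    rng = random.Random(seed)
    endpoints = Counter(walk_endpoint(graph, s, alpha, rng) for _ in range(w))
    # Forward vector x_s = (e_s, empirical endpoint distribution), stored sparsely.
    x_s = {("p", s): 1.0}
    for v, count in endpoints.items():
        x_s[("r", v)] = count / w
    coords = [c for c in x_s if y_T.get(c, 0) > 0]
    if not coords:
        return []
    stage1 = [x_s[c] * y_T[c] for c in coords]  # proportional to p'_s
    score = Counter()
    for c in rng.choices(coords, weights=stage1, k=n_s):
        targets, weights = per_coord[c]
        score[rng.choices(targets, weights=weights)[0]] += 1
    return [t for t, _ in score.most_common()]


# Hypothetical toy instance: every walk that leaves s is absorbed at b.
graph = {"s": ["b"], "b": ["b"]}
reverse_vectors = {
    "t1": {("r", "b"): 0.9},
    "t2": {("r", "b"): 0.1},
    "t3": {("r", "z"): 0.5},  # mass on a node the walks never reach
}
y_T, per_coord = precompute_for_set(reverse_vectors)
ranked = sample_and_rank(graph, "s", 0.2, y_T, per_coord, w=200, n_s=2000)
```

In the toy instance, `t1` receives roughly nine times as many samples as `t2`, and `t3` is never sampled because no walk ends at a node where it has positive residual.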


Accuracy: BiPPR-Precomp-Sampling does not sample exactly
in proportion to $\pi_s$; instead, the sample probabilities
are proportional to a distribution $\hat{\pi}_s$ satisfying the guarantee
of Theorem 1. In particular, for all targets $t$ with $\pi_s[t] \ge \delta$,
this will have a small relative error $\epsilon$, while targets with
$\pi_s[t] < \delta$ will likely be sampled rarely enough that they
won't appear in the set of top-$k$ most sampled nodes.
Storage Required: The storage requirements for BiPPR-Precomp-Sampling
(and for BiPPR-Precomp-Grouped) depend
on the distribution of keywords and how $r_{\max}$ is chosen
for each target set. For simplicity, here we assume a
single maximum residual $r_{\max}$ across all target sets, and
assume each target is relevant to at most $\gamma$ keywords. For
example, in the case of name search, each user typically has
a first and last name, so $\gamma = 2$.

Theorem 3. Let graph $G$, minimum-PPR value $\delta$ and
time-space trade-off parameter $r_{\max}$ be given, and suppose
every node contains at most $\gamma$ keywords. Then the total
storage needed for BiPPR-Precomp-Sampling to construct a
sampler for any source node (or distribution) $s$ and any set
of targets $T$ corresponding to a single keyword is $O\left(\frac{\gamma m}{\alpha\, r_{\max}}\right)$.

We can choose $r_{\max}$ to trade off this storage requirement
against the running time requirement of $O(c\, r_{\max}/\delta)$; for
example, we can set both the query running time and per-node
storage to $\sqrt{c \gamma d/\delta}$, where $d = m/n$ is the average degree.
Now for name search $\gamma = 2$, and if we choose $\delta = 1/n$ and
$\alpha = \Theta(1)$, the per-query running time and per-node storage
is $O(\sqrt{m})$.
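The balanced setting can be derived by equating the two costs. The following is a sketch that assumes the Theorem 3 bound has the form $O(\gamma m/(\alpha\, r_{\max}))$ in total, i.e. $\gamma d/(\alpha\, r_{\max})$ per node, and that keeps the factor $\alpha$ explicit even though the text absorbs it as a constant:

```latex
\begin{align*}
\text{query time} &= \frac{c\, r_{\max}}{\delta}, \qquad
\text{per-node storage} = \frac{\gamma m}{\alpha\, r_{\max}\, n}
                        = \frac{\gamma d}{\alpha\, r_{\max}}, \\
\frac{c\, r_{\max}}{\delta} = \frac{\gamma d}{\alpha\, r_{\max}}
  &\;\Longrightarrow\; r_{\max} = \sqrt{\frac{\gamma d\, \delta}{\alpha c}}, \\
\text{both costs} &= \sqrt{\frac{c\, \gamma d}{\alpha\, \delta}}
  \;\overset{\gamma = 2,\ \delta = 1/n}{=}\; \sqrt{\frac{2 c\, d n}{\alpha}}
  = \sqrt{\frac{2 c\, m}{\alpha}} = O(\sqrt{m}).
\end{align*}
```

The last step uses $dn = m$ and treats $c$ and $\alpha$ as constants.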

Proof. For each set $T$ corresponding to a keyword, and
each $t \in T$, we push from nodes $v$ until, for each $v$, $r^t[v] < r_{\max}$.
Each time we push from a node $v$, we add an entry
to the residual vector of each node $u \in N^{in}(v)$, so the space
cost is $d^{in}(v)$. Each time we push from a node $v$, we increase
the estimate $p^t[v]$ by $\alpha r^t[v] \ge \alpha r_{\max}$, and
$\sum_{t \in T} p^t[v] \le \sum_{t \in T} \pi_v[t] = \pi_v[T]$, so $v$ can be pushed from at most
$\pi_v[T]/(\alpha r_{\max})$ times. Thus the total storage required is

$$\sum_{v \in V} d^{in}(v)\,(\#\text{ of times } v \text{ pushed}) \le \sum_{v \in V} d^{in}(v)\, \frac{\pi_v[T]}{\alpha r_{\max}}. \qquad (5)$$

Let $\mathcal{T}$ be the set of all target sets (one target set per
keyword). Since each node belongs to at most $\gamma$ target sets,
$\sum_{t \in V} \pi_v[t] = 1$, and $\sum_{v \in V} d^{in}(v) = m$, the total storage over all keywords is

$$\sum_{T \in \mathcal{T}} \sum_{t \in T} \sum_{v \in V} d^{in}(v)\, \frac{\pi_v[t]}{\alpha r_{\max}} \le \gamma \sum_{t \in V} \sum_{v \in V} d^{in}(v)\, \frac{\pi_v[t]}{\alpha r_{\max}} = \frac{\gamma m}{\alpha r_{\max}}.$$

Adaptive Maximum Residual: One way to improve the
storage requirement is by using larger values of $r_{\max}$ for
target sets $T$ with larger global PageRank. Intuitively, if $T$
is large, then it's easier for random walks to get close to $T$,
so we don't need to push back from $T$ as much as we would
for a small $T$. We now formalize this scheme, and outline the
savings in storage via a heuristic analysis, based on a model
of personalized PageRank values introduced by Bahmani et
al. [3].

For a fixed $s$, we assume the values $\pi_s[v]$ for all $v \in V$
approximately follow a power law with exponent $\beta$. Empirically,
this is known to be an accurate model for the Twitter
graph: Bahmani et al. [3] find that the mean exponent for
a user is $\beta = 0.77$ with standard deviation 0.08. To analyze
our algorithm, we further assume that $\pi_s$ restricted to $T$
also follows a power law, i.e.:

$$\pi_s[t_i] = \frac{1 - \beta}{i^{\beta}\, |T|^{1-\beta}}\, \pi_s[T]. \qquad (6)$$

Suppose we want an accurate estimate of $\pi_s[t_i]$ for the top-$k$
results within $T$, so we set $\delta_k = \pi_s[t_k]$. From Theorem 1,
the number of walks required is:

$$w = c\, \frac{r_{\max}}{\delta_k} = c_2\, \frac{r_{\max}\, |T|^{1-\beta}}{\pi_s[T]},$$

where $c_2 = k^{\beta} c/(1 - \beta)$. If we fix the number of walks as
$w$, then we must set $r_{\max} = w\, \pi_s[T]/(c_2 |T|^{1-\beta})$. Also, for a
uniformly random start node $s$, we have $E[\pi_s[T]] = \pi[T]$ (the
global PageRank of $T$). This suggests we choose $r_{\max}(T)$ for
set $T$ as:

$$r_{\max}(T) = \frac{w\, \pi[T]}{c_2\, |T|^{1-\beta}}. \qquad (7)$$

Going back to equation (5), suppose for simplicity that the
average $d^{in}(v)$ encountered is $d$. Then, using
$\sum_{v \in V} \pi_v[T] = n\, \pi[T]$, the storage required
for this keyword is bounded by:

$$\sum_{v \in V} d^{in}(v)\, \frac{\pi_v[T]}{\alpha\, r_{\max}(T)} \approx \frac{d\, n\, \pi[T]}{\alpha\, r_{\max}(T)} = \frac{c_2}{\alpha w}\, d\, n\, |T|^{1-\beta}.$$

Note that this is independent of $\pi[T]$. There is still a dependence
on $|T|$, which is natural since for larger $T$ there
are more nodes, which makes it harder to find the top-$k$. For
$\beta = 0.77$, the rate of growth, $|T|^{0.23}$, is fairly small, and in
particular is sublinear in $|T|$.

Dynamic Graphs: So far we have assumed that the graph
and keywords are static, but in practice they change over
time. When a keyword is added to some node $t$, the node's
reverse vector $y^t$ needs to be added to the sampling data
structure for that keyword. When an edge is added, the
residual values need to be updated. We leave the extension
to dynamic graphs to future work.


4.3 Experiments 

We conduct experiments to test the efficiency of these 
personalized search algorithms as the size of the target set 
varies. We use one of the largest publicly available social 
networks, Twitter-2010 [16] with 40 million nodes and 1.5 
billion edges. For various values of |T|, we select a target
set T uniformly among all sets with that size, and compare 
the running times of the four algorithms we propose in this 
work, as well as the Monte Carlo algorithm. We repeat this 
using 10 random target sets and 10 random sources s per 
target set, and report the median running time for all al¬ 
gorithms. We use the same target sets and sources for all 
algorithms. 

Parameter Choices: Because all five algorithms have parameters
that trade off running time and accuracy, we choose
parameters such that the accuracy is comparable so we can
compare running time on a level playing field. To choose a
concrete benchmark, we chose parameters such that the
precision@3 of each of the four algorithms we propose is consistently
greater than 90% for the range of $|T|$ we used in the
experiments. We chose parameters for Monte-Carlo so that our
algorithms are consistently more accurate than it, and its
precision@3 is greater than 85%. In the full version we plot
the precision@3 of the algorithms for the parameters we use
when comparing running time.
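The section does not spell out how precision@3 is computed. A common definition, assumed here, is the fraction of the $k$ highest-ranked returned targets that belong to the true top $k$ by PPR. The function below is a hypothetical helper written under that assumption.

```python
def precision_at_k(estimated_ranking, true_scores, k=3):
    """Fraction of the first k returned targets that are in the true top k.

    estimated_ranking: list of targets, best first, as returned by a search method.
    true_scores: dict mapping each target to its ground-truth PPR value.
    Ties in true_scores are broken by dictionary order (an arbitrary choice).
    """
    true_top = set(sorted(true_scores, key=true_scores.get, reverse=True)[:k])
    return len(true_top.intersection(estimated_ranking[:k])) / k


# Hypothetical scores for four targets.
truth = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.05}
p_good = precision_at_k(["b", "a", "c", "d"], truth)  # same top-3 set, different order
p_miss = precision_at_k(["a", "c", "d", "b"], truth)  # "d" displaces "b"
```

Under this definition the order within the top $k$ does not matter, only membership.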

We used $\delta = \pi_s(t_k)$, where $\pi_s(t_k)$ is estimated using Eqn.
6 with $k = 3$, power law exponent $\beta = 0.77$ (the mean value
found empirically on Twitter), and $\pi_s(T) = |T|/n$
(the expected value of $\pi_s(T)$ since $T$ is chosen uniformly
at random). Then we use Equation 7 to set $r_{\max}$, using
$c = 20$ and two values of $w$: 10,000 and 100,000. We used
the same value of $r_{\max}$ for BiPPR-Precomp, BiPPR-Precomp-Grouped,
and BiPPR-Precomp-Sampling. For Monte-Carlo,
we sampled ^ walks^. 
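These parameter choices can be written as two small functions. The sketch assumes Eqn. 6 is the normalized power law $\pi_s[t_i] = (1-\beta)\,\pi_s[T]/(i^{\beta}|T|^{1-\beta})$ and Eqn. 7 is $r_{\max}(T) = w\,\pi[T]/(c_2 |T|^{1-\beta})$ with $c_2 = k^{\beta} c/(1-\beta)$. The function names and the example target-set size are illustrative.

```python
def delta_k(k, beta, pi_T, size_T):
    """Assumed Eqn. 6 at rank k: (1 - beta) * pi_T / (k^beta * |T|^(1 - beta))."""
    return (1.0 - beta) * pi_T / (k ** beta * size_T ** (1.0 - beta))


def r_max_for(w, c, k, beta, pi_T, size_T):
    """Assumed Eqn. 7: r_max(T) = w * pi_T / (c_2 * |T|^(1 - beta))."""
    c2 = k ** beta * c / (1.0 - beta)
    return w * pi_T / (c2 * size_T ** (1.0 - beta))


# Settings quoted in the text: k = 3, beta = 0.77, c = 20, w = 10,000,
# Twitter-2010 with about 40 million nodes; |T| = 1000 is an example size.
n = 40_000_000
size_T = 1000
pi_T = size_T / n  # expected PPR mass of a uniformly random target set
delta = delta_k(3, 0.77, pi_T, size_T)
r_max = r_max_for(10_000, 20, 3, 0.77, pi_T, size_T)
```

Under these assumptions the two quantities are tied together by $r_{\max} = w\,\delta_k/c$, which is the walk count $w = c\, r_{\max}/\delta_k$ from Theorem 1 solved for $r_{\max}$.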

Results: Figure 3 shows the running time of the five algorithms
as $|T|$ varies for two different parameter settings in
the trade-off between running time and precomputed storage
requirement. Notice that Monte-Carlo is very slow on this
large graph for small target set sizes, but gets faster as the
size of the target set increases. For example, when $|T| = 10$
Monte-Carlo takes half an hour, and even for $|T| = 1000$ it
takes more than a minute. Bidirectional-PPR is fast for
small $T$, but slow for larger $T$, taking more than a second
when $|T| > 100$. In contrast, BiPPR-Precomp-Grouped and
BiPPR-Precomp-Sampling are both fast for all sizes of $T$,
taking less than 250 ms when $w = 10{,}000$ and less than 25
ms when $w = 100{,}000$.

^Note that Monte-Carlo was too slow to finish in a reasonable
amount of time, so we measured the average time
required to take 10 million walks, then multiplied by the
number of walks needed. When measuring precision, we
simulated the target weights Monte-Carlo would generate
by sampling $t_i$ with probability $\pi_s(t_i)$; this produces exactly
the same distribution of weights as Monte-Carlo would.

(a) Running Time, More Precomputation
(b) Running Time, Less Precomputation

Figure 3: Running time on Twitter-2010 (1.5 billion edges) on log-scale, with parameters chosen such that
the Precision@3 of our algorithms exceeds 90% and exceeds the precision@3 of Monte-Carlo. The two plots
demonstrate the storage-runtime tradeoff: Figure 3(a) (which performs 10K walks at runtime) uses more
pre-computation and storage compared to Figure 3(b) (with 100K walks).

The improved running time of BiPPR-Precomp-Grouped
and BiPPR-Precomp-Sampling, however, comes at the cost of
pre-computation and storage. With these parameter choices,
for $w = 10{,}000$ the pre-computation size per target set in
our experiments ranged from 8 MB (for $|T| = 10$) to 200 MB
(for $|T| = 1000$) per keyword. For $w = 100{,}000$, the storage
per keyword ranges from 3 MB (for $|T| = 10$) to 30 MB (for
$|T| = 10{,}000$).

To get a larger range of $|T|$ relative to $|V|$, we also perform
experiments on the Pokec graph [23], which has 1.6 million
nodes and 30 million edges. Figure 4 shows the results on
Pokec for $w = 100{,}000$. Here we clearly see the cross-over
point where Monte-Carlo becomes more efficient than
Bidirectional-PPR, while BiPPR-Precomp-Grouped and
BiPPR-Precomp-Sampling consistently take less than 250 milliseconds.
On Pokec, the storage used ranges from 800 KB for
$|T| = 10$ to 3 MB for $|T| = 10{,}000$.

We implement our algorithms in Scala and report running
times for Scala, but in preliminary experiments
BiPPR-Precomp-Grouped is 3x faster when re-implemented in C++, and
we expect the running time would improve comparably for
all five algorithms. Also, we ran each experiment on a single
thread, but the algorithms parallelize naturally, so the
latency could be improved by a multi-threaded implementation.
We ran our experiments on a machine with a 3.33
GHz 12-core Intel Xeon X5680 processor, 12 MB cache, and
192 GB of 1066 MHz Registered ECC DDR3 RAM. We measured
the running time of the thread running each experiment
to exclude garbage collector time. We loaded the graph used
into memory and completed any pre-computation in RAM
before measuring the running time of the algorithms.

Figure 4: Running time on Pokec (30 million edges)
performing 100K walks at runtime. Notice that
Monte-Carlo is slow for small $|T|$, Bidirectional-PPR
is slow for large $|T|$, and BiPPR-Precomp-Grouped
and BiPPR-Precomp-Sampling are fast across the entire
range of $|T|$.

5. RELATED WORK 

Prior work on PPR Estimation: The Bidirectional-PPR
algorithm introduced in the first half of this work builds
on the FAST-PPR algorithm presented in [19]; for details
of prior work on Personalized PageRank estimation, see the
section on existing approaches in [19]. Although FAST-PPR
was the first algorithm for PPR estimation with sublinear
running-time guarantees, it has several drawbacks which are
improved upon by our new Bidirectional-PPR algorithm:

• Bidirectional-PPR has a simple linear structure which
enables searching; Eqn. 4 shows that the estimates are
a dot product between a forward vector and a reverse
vector. In contrast, estimates in [19] require monitoring
each walk to detect collisions with a “frontier” set.

• Bidirectional-PPR is 3x-8x faster than FAST-PPR for the
same accuracy in experiments on diverse networks.

• Bidirectional-PPR is cleaner and more elegant, leading
to simpler correctness proofs and performance analysis.
This also makes it easier to generalize to arbitrary Markov
Chains, as done in [6].

Comparison to Partitioned Multi-Indexing: For personalized
search, our indexing scheme is partially inspired
by the Partitioned Multi-Indexing (PMI) scheme of Bahmani
et al. [4]. Similar to our methods, PMI uses a bidirectional
approach to rank search results according to shortest
path distance from the searching user. Shortest path is easier
to estimate than PPR, due to the fact that shortest path
is a metric; moreover, shortest path is believed to be a less
effective way of ranking search results than PPR. From a
technical point of view, PMI is based on ‘sweeping’ from
closer to more distant targets based on a distance oracle; in
contrast, we use sampling to find the most relevant targets.

Prior work on Personalized PageRank Search: In [7],
Berkhin builds upon the previous work [14] and proposes
efficient ways to compute the personalized PageRank vector
$\pi_s$ at runtime by combining pre-computed PPR vectors
in a query-specific way. In particular, they identify “hub”
nodes in advance, using heuristics such as global PageRank,
and precompute approximate PPR vectors $\pi_h$ for each hub
node using a local forward-push algorithm called the Bookmark
Coloring Algorithm (BCA). Chakrabarti [8] proposes
a variant of this approach, where Monte-Carlo is used to
pre-compute the hub vectors $\pi_h$ rather than BCA.

Both approaches differ from our work in that they construct
complete approximations to $\pi_s$, then pick out entries
relevant to the query. This requires a high-accuracy estimate
for $\pi_s$ even though only a few entries are important.
In contrast, our bidirectional approach allows us to compute
only the entries $\pi_s(t_i)$ relevant to the query.
Figure 5: Median precision@3 for the search algorithms
we compare. Notice that the Precision@3 of
our algorithms exceeds 90% and exceeds the precision@3
of Monte-Carlo.

APPENDIX 

A. MORE EXPERIMENT PLOTS 

In Figure 5, we plot the Precision@3 for several search
algorithms on Twitter-2010 using the same parameters as
the experiments that used $w = 100{,}000$. Note that
BiPPR-Precomp and BiPPR-Precomp-Grouped compute the same estimates,
and these estimates are similar to those of
Bidirectional-PPR, so we plot a single line for their accuracy.












